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The present operation of the ground-based network of gravitational-wave laser interferometers in 
"enhanced" configuration and the beginning of the construction of second-generation (or advanced) 
interferometers with planned observation runs beginning by 2015 bring the search for gravitational 
waves into a regime where detection is highly plausible. The development of techniques that allow 
us to discriminate a signal of astrophysical origin from instrumental artefacts in the interferometer 
data and to extract the full range of information are therefore some of the primary goals of the 
current work. Here we report the details of a Bayesian approach to the problem of inference for 
gravitational wave observations using a network (containing an arbitrary number) of instruments, for 
the computation of the Bayes factor between two hypotheses and the evaluation of the marginalised 
posterior density functions of the unknown model parameters. The numerical algorithm to tackle 
the notoriously difficult problem of the evaluation of large multi-dimensional integrals is based on a 
technique known as Nested Sampling, which provides an attractive (and possibly superior) alterna- 
tive to more traditional Markov-chain Monte Carlo (MCMC) methods. We discuss the details of the 
implementation of this algorithm and its performance against a Gaussian model of the background 
noise, considering the specific case of the signal produced by the in-spiral of binary systems of black 
holes and/or neutron stars, although the method is completely general and can be applied to other 
classes of sources. We also demonstrate the utility of this approach by introducing a new coherence 
test to distinguish between the presence of a coherent signal of astrophysical origin in the data 
of multiple instruments and the presence of incoherent accidental artefacts, and the effects on the 
estimation of the source parameters as a function of the number of instruments in the network. 
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I. INTRODUCTION 

Searches for gravitational waves are entering a crucial 
stage with the network of ground-based laser interferom- 
eters - LIGO [H 0, Virgo and GEO 600 Q - now 
fully operational and engaged in a new year-long data 
taking period [H, @ at "enhanced" sensitivity, which may 
allow the first direct detection of gravitational radiation. 
Construction has already begun for the upgrade of the 
instruments to advanced configuration (second genera- 
tion interferometers) with installation at the sites that 
will start at the end of 2011 When science ob- 

servations resume at much improved sensitivity by 2015, 
several gravitational-wave events are expected to be ob- 
served, opening a new means to explore a variety of astro- 
physical phenomena (see e.g. Ref. [1, [l(| and references 
therein) . 

Coalescing binary systems of compact objects - black 
holes and neutron stars - will be the workhorse source 
for gravitational wave observations. Ground-based laser 
interferometers will monitor the last seconds to min- 
utes of the coalescence of these systems. The theoret- 
ical modelling of the (in-spiral) waveform is well in hand 
(see e.g. [ll[ and references therein), and the search al- 
gorithms are well understood [lJ-UJl- The detection 



rate for on-going searches and observations with second 
generation instruments is estimated to lie in the range 
9x 10~ 5 yr- 1 -0.7yr" 1 and 0.2 yr" 1 - 1000 yr" 1 , respec- 
tively, see [20| for a review. It is likely that in a few 
years time ground-based laser interferometers will allow 
us to extract a wealth of new information ranging from 
the formation and evolution of binary stars, the nature of 
precursors of (short) gamma-ray bursts, dynamical pro- 
cesses in star clusters, and could yield a new set of stan- 
dard candles for precise cosmography. 

As instruments are beginning to operate at a meaning- 
ful sensitivity from an astrophysical, cosmological and 
fundamental physics point of view, much emphasis is 
now being placed on the development of methods that 
offer the maximum discriminating power to separate dis- 
turbances of instrumental origin from a true astrophys- 
ical signal, and to extract the full range of information 
from the detected signals. Bayesian inference provides a 
powerful approach to both model selection (or hypothe- 
sis testing) and parameter estimation. Despite the con- 
ceptual simplicity of the Bayesian framework, there has 
been only limited use of these methods for ground-based 
gravitational-wave data analysis due to their computa- 
tional burden, in this case related to the need to com- 
pute large multi-dimensional integrals. Additionally, the 
Gaussian likelihood functions considered so far do not 
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address the instrumental glitches which arc present in 
data from the current generation of gravitational wave 
detectors. 

Here we present an efficient method to compute con- 
currently the full set of quantities at the heart of Bayesian 
inference: the Bayes factor between competing hypothe- 
ses and the posterior density functions (PDFs) on the 
relevant model parameters. The method is based on 
the Nested Sampling algorithm (2l| - [23j to perform multi- 
dimensional integrals, that present the practical and 
computationally intensive challenge for the implemen- 
tation of Bayesian methods. We demonstrate the algo- 
rithm by considering multi-detector observations of grav- 
itational waves generated during the in-spiral phase of 
the coalescence of a binary system, modelled using the re- 
stricted post 2 '°-Newtonian stationary phase approxima- 
tion, which is the waveform used so far for searches of 
non-spinning binary objects [77| . Initial results based on 
this method and applied to simplified gravitational wave- 
forms were reported in 0, HE]- An application to the 
study of different waveform approximants to detect and 
estimate the parameters of signals generated through the 
numerical integration of the Einstein's equations for the 
two body problem in the context of the Numerical INJec- 
tion Analysis (NINJA) Project was reported in f26U28j ]. 
In this paper we: 

• Provide for the first time full details about the the- 
oretical and technical issues on which the compu- 
tation of the evidence is based; 

• Discuss the errors associated with the computation 
of the integrals and the associated computational 
costs; 

• Show how from the nested samples one can 
construct at negligible computational cost the 
marginalised posterior PDFs on the source param- 
eters; 

• Demonstrate the performance of this technique in 
detecting a binary in-spiral signal against a Gaus- 
sian model of background noise in coherent obser- 
vations using a network of detectors, and by intro- 
ducing a new coherence test to distinguish between 
the presence of a coherent signal of astrophysical 
origin in the data of multiple instruments and the 
presence of incoherent accidental artefacts, and 

• Show the effects of the number of instruments in the 
network on the estimation of the source parameters. 

The method in its present implementation can be ex- 
tended in a straightforward way to the full coalescence 
waveform of binary systems - in fact the first example 
applied on the NINJA data set is described in [27], IH| 
- and the software is available as part of the LSC Anal- 
ysis Library Applications (LALapps) [29|, Ho[. Work is 
already on-going in this direction. Furthermore, the ap- 
proach can also be extended to other gravitational-wave 
signals. 



The paper is organised as follows: in Section [TT| we re- 
view the key concepts of Bayesian inference and the signal 
generated by in-spiralling gravitational wave signals; in 
Section ITU we describe our implementation of the numer- 
ical computation of multi-dimensional integrals for the 
computation of the marginal likelihood and marginalised 
posterior density functions based on nested-sampling the 
limited likelihood function; Section IIVI contains the im- 
plementation details of the algorithm, including a quan- 
tification of the errors that affect the results as a function 
of the choice of the main tuning parameters of the algo- 
rithm and the effects on the scaling of the computational 
costs. Section [V] contains the results from the applica- 
tions of this method to several test cases, including a 
Bayesian coherence test that we introduce here for the 
first time. 

Throughout the paper we use geometric units, in which 
G = c= 1. 



II. BAYESIAN INFERENCE FOR BINARY 
IN-SPIRALS 

Statistical inference can be roughly divided into two 
problems: (i) Model selection, or hypothesis testing, be- 
tween competing hypotheses through the computation of 
the evidences (or marginal likelihoods) of models, and 
(ii) parameter estimation (of the unknown parameters 
on which a model depends) . In the context of Bayesian 
inference, both aspects are simply tackled through an 
application of Bayes' theorem and the standard rules of 
probability theory. 

While this approach is mathematically straightfor- 
ward, its implementation is hampered by the need to 
explore large parameters spaces and perform what are 
in general computationally costly, high-dimensional inte- 
grals. For gravitational waves generated by binaries with 
negligible spins and eccentricity - the case considered in 
this paper - the number of dimensions of parameter space 
is 9, and for a generic binary system (described by gen- 
eral relativity) the total number of parameters increases 
up to 17. This technical aspect is one of the main factors 
that has limited the application of a Bayesian approach 
in a number of problems. 

Model selection has been tackled through a number of 
techniques, including Reversible Jump MCMC [3l| and 
thermodynamic integration (32j . Parameter estimation is 
usually dealt with using MCMC methods [HI, [34| , which 
may include advanced techniques such as parallel tem- 
pering [H, [3(| and delayed rejection [37l - l4fjj to enhance 
the exploration of the parameter space. For applications 
of these and other Bayesian methods in ground-based 
gravitational- wave observations, see e.g. [2a, [28l l4l| - l52l [ . 

Nested Sampling [2"D-|23j is a powerful numerical tech- 
nique to deal with multi-dimensional integrals. It dif- 
fers from other Monte Carlo techniques such as Markov- 
Chain Montc-Carlo (MCMC) methods [H, |H[ that arc 
popular in applications of Bayesian inference, in that it 
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is specifically designed to estimate the evidence integral 
itself, see Eq. (QJ, with the marginalised posterior PDFs 
being optional by-products. 

In this section we outline the key concepts of Bayesian 
inference and review the signal model - in-spiral signals 
generated by non-spinning binary systems in circular or- 
bits - on which we concentrate the development and ap- 
plication of the algorithm. Many of the technical imple- 
mentation aspects are associated to sampling effectively 
the likelihood function in the sky position parameters, 
and are therefore completely general for any application 
to short-lived bursts characterised by an arbitrary wave- 
form. 



A. Model selection 

In the formalism of Bayesian inference, the probability 
of a model or hypothesis "Hi, given a set of observational 
data d and prior information / is given by Bayes' theo- 



P(Hi\d, I) 



P(Hi\I)P(d\Hi,I) 
P(d\I) 



(1) 



In this expression, P(%,|J) is the prior probability of %i, 
P(d\'Hi, I) is the likelihood function of the data, given 
that Jii is true, and 



P{d\I) = Y,P{d\HuI) 



is the marginal probability of the data set d, which can 
only be calculated if there exists a complete set of inde- 
pendent hypotheses such that Ylj P(Hj\d, I) = 1. Here 
we do not enumerate such a set of models, but we can 
still make comparisons between the models we do have 
by calculating the relative probabilities in the form of the 
posterior odds ratio Oij between two of them, 



Oi 



P(H Z \I) P{d\Hj,I) = P{Ui\I) 

P(Hj\i) p{d\n h i) P{Uj\i)- 



Bi 



(2) 



in the previous equation the normalisation factor P{d\I) 
cancels out, and 



Bij = 



_ P(d\Ui,I) 



p(d\n 3 ,i) 



(3) 



is known as the Bayes Factor or ratio of likelihoods. 

The Bayes factors can be directly found for hypothe- 
ses which have no free parameters, but the gravitational 
wave signal we are modelling depends on a set of pa- 
rameters, 8 £ Q, described in Section III CI where G is 
the parameter space. In this case, the likelihood of the 
model % must be marginalised over all the parameters 



weighted by their prior probability distribution, giving 
the marginal likelihood or evidence, 

Z = P(d\H, I)= I P (6\n, I)p(d\H, 6, T)dd, (4) 
Je 

where p{8\H,I) is the prior probability distribution over 
the parameter space. 

The integral Q cannot be computed analytically in 
all but the most trivial cases, and standard grid-based 
numerical approaches can take a prohibitively long time 
to complete when the model is high-dimensional and/or 
very large with respect to the posterior as is the case 
for gravitational-wave observations. By using the nested 
sampling algorithm, developed by Skilling |21| . we have 
been able to solve the problem of calculating this inte- 
gral - and as a consequence the desired Bayes factors 
and the marginalised posterior density functions of the 
unknown model parameters 8 - in a time that makes 
Bayesian techniques applicable in actual gravitational- 
wave search pipelines. Section Mil provides an overview 
of the algorithm, and the implementation strategy that 
we have adopted. Further implementation details, as well 
as the characterisation of its accuracy in the evaluation 
of the evidence integral as a function of CPU time are 
discussed in Section [TV] 



B. Parameter estimation 

In general the hypotheses depend on a set of unknown 
parameters 8 £ O. As part of the inference process, 
one wants also to compute the posterior density function 
(PDF) 



p(8\d,n,I) = 



p{e\n,i)p{d\8,n,i) 
p(d\n,i) 



(5) 



of the parameters, in this specific case of binary systems 
quantities such as the masses, position in the sky and 
distance. 

The marginalised PDF on a subset 8a of the param- 
eters - our notation is 8 = {8a, 8b}, 8a, b £ @a,b _ is 
defined as 



p(8 A \d,H,I)= [ P (d\d,H,I)d8 B 



(6) 



From p(8A\d, H, I) it is then straightforward to compute 
e.g. the posterior mean 



8 A p(8 A \d,H,I)d8 A 



(7) 



We will show in Section IIIIBI that the Nested Sampling 
algorithm provides a way of computing the marginalised 
posterior PDFs, Eq. (JS|), with totally negligible addi- 
tional computational costs from the results of the numer- 
ical evaluation of the evidence, Eq. In this respect, 
Nested Sampling may provide advantages with respect to 
more traditional MCMC algorithms. 
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C. Target waveform 

In this paper we consider observations of gravitational 
waves from a network (of an arbitrary number) of in- 
terferometers. The datum (in the frequency domain) at 
frequency / from each detector that we label with D is: 



(8) 



where h^ D \f) and n^ D \f) are the gravitational wave 
signal and noise contribution, respectively. In Section 
tVl we will consider specific choices of the interferometer 
network, and we will label with D = H, L,V the LIGO 
Hanford 4-km arm instrument, the LIGO Livingston in- 
terferometer and Virgo, respectively. In practice, one 
works with discrete data, and we will refer to (and 

analogously ftl and n[ for the signal and noise, re- 
spectively) as the data point at discrete frequency of 
the instrument D. As short hand notation, we will use d 
to identify the whole data set from the relevant network 
of instruments, and to to the data set from a single 
detector D, so that 



d = {S H \S L \....}. 



(9) 



Let us consider a (geocentric) reference frame, and a 
gravitational wave source described by the two polarisa- 
tion amplitudes h + (j) and h x (f) located in the sky at 
(a, 5), where a is the right ascension and S the declina- 
tion of the source. The signal as measured at the output 
of the detector D is therefore 



U D \f) 



Fi D) h+(f)+F^'h x {f) 



-2irifAt (D) 



(D), 



(10) 

where (tp, a, 6; t ) and (-0, a, 5; t ) arc the detec- 
tor response functions to each polarisation, dependent on 
the polarisation angle tp (see e.g. Appendix B of [5(| for 
the definition conventions) , and the time of observation 
to. These are computed using functions available in the 
LSC Algorithm Library [29|. Given a source at location 
(a, 6), At( D '(a,5;to) is the difference in gravitational- 
wave arrival time between the geocentre and the detec- 
tor D, computed with respect to a reference time to that 
identifies the observation, see the text after Eq. ([lUf be- 
low for our specific choice of to- At^ D ' depends on the 
time of observation, as for a fixed position in the sky, 
the signal impinges on the instruments with different rel- 
ative time delays due to the Earth's rotation. By using 
the transformation (|10[) . the waveform phase which is the 
most expensive part of the model, needs only to be cal- 
culated once, and is then transformed to the observed 
signal in each detector. 

In this paper, we concentrate on the in-spiral sig- 
nal generated during the coalescence of a binary sys- 
tem of compact objects (black holes or neutron stars) 
of masses mi and m,2. Other mass parameters that we 
will use are the total mass M = mi + m,2, the sym- 
metric mass ratio r\ = mi 7712/ (mi + 7712) 2 and the chirp 



mass M. = (mim2) 3 / 5 /( m i + m^) 1 / 5 . We assume cir- 
cular orbits and we further restrict to compact objects 
that are non-spinning. We note however, that (an ear- 
lier implementation of) the approach discussed here was 
already successfully applied to the case of the full coales- 
cence waveform generated by compact binaries p7l |28| . 
Moreover, most of the results presented in this paper are 
totally general and independent of the specific waveform 
model, and can be applied and/or extended to any class 
of signals. 

The model for the gravitational-wave signal that we 
consider is the frequency domain, stationary phase, 
post 2 '°-Newtonian approximation to the waveform, and 
more precisely the so-called "TaylorF2" approximant - 
for an up-to-date summary of the different TaylorF/T 
approximants we refer the reader to 55| and references 
therein. The waveform is therefore described by two in- 
trinsic parameters, the two masses or any two indepen- 
dent combinations of them, such as M. and r\. We note 
that the specific choice of the post-Newtonian order is 
irrelevant for the issues discussed in this paper, as long 
as the waveform model used to construct the likelihood 
function matches the one adopted in the "injections" [78j 
to generate synthetic data sets to explore the algo- 
rithm. As a consequence the results presented here would 
be essentially identical if we had adopted the post 3 5 - 
Newtonian order which is currently used in the analy- 
sis of the LIGO/Virgo data for the current science run 
(S6/VSR2). We generate the waveform directly in the 
frequency domain using functions of the LSC Analysis 
Library (LAL) [29[ . The frequency domain gravitational- 
wave polarisation amplitudes are given by 



h + (f) = A(l + cos 
h x (f) = 2A cost/ 



0/ 



-7/6.i*(/) 



-7/6 e t*(/)-»7r/2 



(11) 

(12) 



Here, the symbol t denotes the inclination angle, defined 
as the angle between the line of sight to the source from 
the detector and the constant direction (as the objects 
are assumed to be non-spinning) of the orbital angular 
momentum. The gravitational- wave phase \&(/) at the 
post 2 -Newtonian order is given by 



$>(M,r),to,(t>o-J) =27r/t -^o 



(k-5)/3 



fe=0 



(13) 



where i^n and tpk are the standard Newtonian and post- 
Newtonian coefficients, whose expressions can be found 
in e.g. Ref. [EtJ- In our implementation, to is taken as 
the GPS time at the geocentre at which the frequency 
of the gravitational wave passes that of the nominal in- 
nermost stable circular orbit, /isco = (6 3 / 2 7rM) _1 , and 
consequently 4>o is the phase of the signal at this time. 
The amplitude of the gravitational wave A cx M 5/6 /D L 
and is computed by the LAL Stationary Phase Approx- 
imation Template [29(. In summary, the observed signal 
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is therefore dependent on nine quantities, which for con- 
venience we will write as the parameter vector 

9 = {M,r),t ,(j)o,D L ,a,6,>ip,i}. (14) 

Finally we discuss the assumptions on the noise wS D >. 
We will make the standard assumption that the noise is 
a Gaussian and stationary process with zero mean and 
variance described through the one-sided noise spectral 
density si D \f): 

{n(f) iD) ) = 0, (15) 
(^)(/)n^)*(/')) = \sU-nSU), (16) 

where (.) stands for the ensamble average. Under these 
assumptions, the likelihood of a given noise realisation 
mS D > = ?io is simply given by the multivariate Gaussian 
distribution 

p(n (fl) =no)«e- (,i ° w/2 , (17) 

where (.|.) stands for the usual inner product (59|, see 
Eq. (1A17|) of the Appendix. 

We will further assume that the noise in different de- 
tectors is uncorrelated, so that we generalise Eq. (|16|) to 

(« (fl, (/)S (flT (/)) = ls(f-f')S DD ,S^(f). (18) 

The latter assumption is appropriate for sites that are 
well isolated from each other, but this may not be true for 
the two instruments co-located at the Hanford site. Here 
we only consider a simulated network with no more than 
one instrument at any location. In terms of the elements 
fik of the discrete Fourier series of the discretely-sampled 
time domain data, with sampling interval At and segment 
length T, this is given as M 2 (\n k \ 2 ) = ^S(f k ). Full de- 
tails of the conventions used for discretely-sampled data 
are given in Appendix [X] 



D. Models 

The problem of assessing the confidence of detection 
of a signal in interferometer data is the primary mo- 
tivation for the Nested Sampling technique and imple- 
mentation we present here. Translated into the Bayesian 
framework, assessing the confidence of detection means 
computing the Bayes factor between two hypotheses, and 
therefore we must specify exactly which models we are 
comparing. These models are the mathematical descrip- 
tions of the data d, Eqs. ([8]) and ((9|), which either contain 
a gravitational- wave signal h(9) paramctcrised by a cer- 
tain vector 9 (described in section uTC]) . or it does not. 
In addition, the data also contain a contribution from 
instrumental noise, described by Eqs. ([ToT - (TT8"|) . The two 
models we will use can be written as: 



• Hn- the noise-only model, that corresponds to the 
hypothesis that there is only noise (with statistical 
properties described in Section iHCj) present in the 
data set: 

d = ri; (19) 

note that in our application we assume that the 
noise spectral density is known (79| and this model 
has therefore no free parameters; 

• Hs- the signal model, that corresponds to the hy- 
pothesis that the data contain noise (as before) 
and a gravitational-wave described by the wave- 
form family h(9): 

d = n + h(9). (20) 

Although in reality, there is also a wide range of instru- 
mental glitches and artefacts which alters the evidence of 
each model in a variety of ways, initially we focus on char- 
acterising the algorithm with simulated data. Strategies 
for distinguishing between a coherent signal and other 
artefacts are discussed in Section IV C 11 

The computation of the marginal likelihood Eq. (HJ) 
and Bayes factor, Eq. ^ requires the integration of the 
likelihood function, p(d\9,W, I), where H is either Hn or 
Jis, multiplied by the prior density function of the un- 
known parameters for the given hypothesis. We discuss 
the choice of prior in Section IV Al Here we concentrate 
on the expression of the likelihood function. In the case 
of the hypothesis % n , the likelihood function is simply 

p(d^\9,n N ,I)o,e^ d(D ^ d(Dy ^\ (21) 
see Eq. p7|) . 

For the hypothesis Hs, the likelihood of observing a 
data set d^ at the output of the instrument D given the 
presence of a gravitational wave hS D > (9) characterised by 
the parameter vector 9 is 

p(d^\e,H s J) oce-^'-^l^-* 10 ')^ (22) 

The constant of proportionality is equal in equations 1221 
and[5TJ an d cancels when the ratio of these quantities is 
taken. If we have a data set comprising observations 
from multiple interferometers, say d = {dr 1 , d^ , . . .}, 
the Bayesian framework allows straightforward coherent 
analysis. To do this, we simply write the joint likelihood 
of the independent datasets in all the detectors 

p(d&H t I) = l[p(S D ^\9,n,I), (23) 

(D) 

where p{d^ D) \9,U,I) is either given by Eq. ([17]) or (j22|) . 
In Appendix [X] we provide explicit expressions for the 
likelihood function (|2"3"]l in the case of discrete data used 
for the implementation in the software code. 
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III. THE NESTED SAMPLING ALGORITHM 



evidence, 



The nested sampling algorithm is described by 
Skilling [2l[ as a reversal of the usual approach to 
Bayesian inference, in that it directly targets the com- 
putation of the evidence integral (j4]), producing samples 
from the posterior PDF, Eq. (|5|) of the model parameters 
9 as a by-product. Although the original formulation was 
designed as a tool for Bayesian inference, it is actually a 
general method of numerical integration which could be 
applied to other continuous integrals. The basic algo- 
rithm, described in [2l[ , is therefore applicable to a wide 
range of problems, but in its generality it leaves consid- 
erable decisions to be made on the implementation, con- 
figuration and tuning to each specific application. In this 
section, we will review the core algorithm, Section IIII Al 
and the processing of the output of the algorithm to ex- 
tract samples from the posterior PDF which can then 
be used for parameter estimation, Section IlII Bl In Sec- 
tion |IV] we provide detailed information on our solution 
of the problem of sampling the limited prior distribution, 
Section IIV A[ and we will also examine the theoretical 
accuracy achievable with the algorithm. This result will 
then be compared to the practical accuracy achieved and 
its trade-off with computational cost in Section IIV Bl 

More recently, MultiNest [6(| HH , based on the tech- 
nique of Nested Sampling, has been applied to data sets 
primarily in the context of cosmology [62T - [o4( and par- 
ticle physics [65l - l70| . More recently it has been used 
on selected mock data sets of the Laser Interferometer 
Space Antenna to search for and estimate the parame- 
ters of massive black hole binary systems characterised 
by high ~ 1000 signal-to- noise ratio [7l|. Our implemen- 
tation is fundamentally different from MultiNest in the 
way in which the structure of the likelihood function is 
explored, and in particular it replaces the clustering algo- 
rithm or ellipsoidal rejection schemes with (non-trivial) 
MCMC explorations of the prior range, that are specif- 
ically tailored to the observation of gravitational-waves 
with a network of ground-based instruments at moderate- 
to-low ( 20) signal-to-noise ratio. Moreover, given the 
large amount of data and the need to maximise com- 
putational efficiency while retaining the accuracy of the 
evaluation of the relevant integrals, our study concen- 
trates on quantifying the errors in the evaluation of the 
key quantities and relating them to the computational 
costs. 



A. Computing the evidence integral 

The evidence Z = P(d\T-L,I), given in equation 0] is 
found by integrating the product of the prior distribution 
with the likelihood function, in other words, Z is the 
expectation of the likelihood with respect to the prior. 
Using the product rule we can easily see the relationship 
between the prior, likelihood, posterior PDFs and the 



p(e\n,i) x P {d\e,n,i) 

Prior x Likelihood 



Z xp(9\d,n,I) (24) 
Evidence x Posterior. 



As the prior and posterior are by definition normalised, 
the magnitude of the evidence is governed by the like- 
lihood function, which provides a measure of how well 
the data fits the hypothesis T~L. In order to evaluate Z 
one must sum the product on the left side of equation 
IM1 at each point 9 £ O in parameter space. In our case 
the parameter space is a continuous manifold, and the 
likelihood function is a smooth function on the manifold 

which we integrate. When this integral is not solv- 
able using analytic methods, we must approximate it by 
using a subset of points on 0, for instance by placing a 
lattice on parameter space. Once such an approximation 
to a finite number of points is made, the result becomes 
subject to an integration error which is dependent on the 
precise means of integration. 

Instead of a regular lattice of points, consider a 
stochastic sampling of the prior distribution to gener- 
ate a basket of TV samples - in the nested sampling jar- 
gon called live points - which we will denote 9i, with 

1 = 1...N. The evidence integral (equation [3J could 
then be expressed as 



Z 



P (O\H,I)p(d\H,0,I)dO. 



Y^p(d]9i,H,I)i 

i=l 
N 



where the "weight" 



Wl = P (6i\H,I)d0 



(25) 



(26) 



is the fraction of the prior distribution represented by 
the i-th sample, and Li = p(d\W,9i, I) is its likelihood. 
In the presence of a signal, the evidence integral is typ- 
ically dominated by a small region of the prior where 
the likelihood is high, concentrated in a fraction e~ H 
of parameter space. H is called the information in the 
data, subject to the particular model and parameterisa- 
tion used, and is measured in nats (using base 2 instead 
of base e would give information measured in bits, where 
1 nat = log 2 ebits ~ 1.44 bits). H is defined as 



h= / P (e\d,n, i) log 



P {9\d,n,I)d9 
dX 



d6, 



(27) 



and will be used in section IIV Bl to quantify the accu- 
racy [2l|; X is the prior mass, and is defined in Eq. (f2"5)l 
below. If it is not known in advance where in parame- 
ter space the posterior is concentrated, approximately e H 
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Parameter space X^ X ? X^ 1 



FIG. 1: Each sample in the basket of live points can be 
thought of as lying on a contour line of equal likelihood value. 
Figure reproduced from [2l| . 

points would be needed to avoid the possibility of missing 
the maximum of the likelihood function using a regular 
grid. In the case of a compact binary in-spiral signal 
as observed in a network of ground-based interferome- 
ters and using the parameterisation given in section lll CI 
H > 20nats. If a regular grid of points was used (assum- 
ing a uniform prior) , finding the weights associated with 
each point wi = 1/N would be simple, but the number of 
samples needed, N becomes prohibitively large. The key 
concept on which the nested sampling algorithm rests is 
the means to calculate the Wi of stochastically sampled 
points. By evolving the collection of N points to higher 
likelihood areas of parameter space, the algorithm simul- 
taneously searches for the peaks of the distribution and 
accumulates the evidence integral as it progresses. 

In order to find the weights associated with each point 
9i, it is useful to think of each point as lying on a (not 
necessarily closed) contour surface of equal likelihood in 
the parameter space. The prior mass - that is the frac- 
tion of the total prior volume - enclosed by the i-th con- 
tour surface is denoted X t , with the lowest likelihood 
contour line enclosing the largest volume and the maxi- 
mum likelihood point enclosing the smallest. With this 
definition, Xq = 1 . We can then think of a mapping be- 
tween the contour lines in physical parameter space and 
the fractions of the prior Xi, where the likelihood L(X) 
increases toward smaller values of X , as shown in figure 
[TJ and AX; = X i+1 - X t . The evidence, Eq. (gj) or (gSJ 
can then be expressed as the one-dimensional integral 

Z = J L{X)dX » Y^ L ( X d AX i- ( 28 ) 

z 

As the inverse mapping 9(X) is not known, the analyti- 
cal integral cannot be performed. However, as we know 
that the prior distribution is normalised to unity, the 
unknown prior mass enclosed by the outermost contour 
through Xi has a probability distribution P{X{) which is 
equal to the distribution of a new variable t\ £ [0, 1], the 
maximum of N random numbers drawn from the uniform 
distribution [7(0,1). If we then replace the first point 
with a new point sampled from the prior distribution 



limited to the volume lying at higher likelihood than L\, 
X(L > Li), we can repeat the process so that X2 = tiX\ 
and Xi = tiXi—i, where by definition U = Xi/Xi-i is the 
shrinkage ratio. The probability of U is P{U) = Nt^ 1 , 
where U is the largest of N random numbers drawn from 
U(0, 1). The volume enclosed at each iteration therefore 
shrinks geometrically, ensuring the speedy convergence 
of the integral. The mean decrease in the volume at each 
iteration is 

E [log t}= [ log (t) p(t) dt=-N~ 1 , (29) 
Jo 

and an estimate of the statistical variance introduced by 
this process is 

/ (\ogt~E{\ogt]) 2 p(t)dt = N- 2 . (30) 
Jo 

The distribution of t can also be sampled by generating 
the N uniform random numbers and creating many reali- 
sations of the is for each iteration in the algorithm. In our 
testing with this procedure, it was found that for a rea- 
sonable number of realisations, the estimated mean and 
variance were very close to the expected figures. Using 
the approximation of the mean, we can therefore write 
the fractional prior volumes 

logX, a - (i±Vi)/N, (31) 

and we use this approximation in the implementation, 
where we work with logarithmic quantities to overcome 
the huge range of the variables. 

As we now have an approximation to the proportion of 
the prior mass remaining after the i-th iteration, we can 
assign a weight to each sample as Wi = Xi — Xi—\. 

As all structure of the likelihood function in the pa- 
rameter space is eliminated by the mapping to X, the 
nested sampling algorithm is in principle robust against 
multi-modal distributions, degeneracies and problems 
arising from the high dimcnsionalality of the parameter 
space. However, this relies on being able to sample the 
likelihood-limited prior distribution effectively. This dif- 
ficult problem can be solved in a variety of ways, but the 
approach which we have found effective is described in 
section HVA"1 

Note that if the bulk of the posterior is concentrated 
in a region of size e~ H of the prior, it will take approxi- 
mately NH± V NH iterations of the geometric shrinkage 
to reach the zone of high likelihood. This tells us that 
the algorithm will take longer to compute the integral 
if there is a larger amount of information in the data. 
This means the run time of the implementation is essen- 
tially dependent on the signal to noise ratio of any found 
signals, as well as on the number of live points N used. 

Finally, we need to specify a termination condition, 
upon which we decide that the integral is finished. We 
could set a hard number for this, or a certain fraction of 
the prior, but the total number of points needed varies 
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FIG. 2: An illustrative plot of the log likelihood against the 
fraction of the prior Xi, generated as the algorithm progresses, 
to be compared with figure [T] Reading this from right to 
left, as the fraction of the prior enclosed by the contour line 
decreases, the likelihood increases as the algorithm proceeds. 
The individual details of the distribution are smoothed out 
by the projection onto the X parameter. The bulk of the 
probability from the signal occurs at around 1CF 11 = e -25,3 , 
in good agreement with the estimated information content of 
~ 25.7 nats. The inset shows the slightly different results 
gathered by running the algorithm 10 times with a different 
random seed, where the precise samples used to integrate are 
different. 

a great deal, particularly with the signal to noise ratio 
of the signal, if any. In our implementation, we keep 
track of the maximum likelihood point so far discovered. 
The algorithm will keep running until the total evidence 
that would be left if all the remaining points lay at the 
maximum likelihood so far discovered becomes less than a 
certain fraction of the total evidence so far accumulated. 
Based upon experience, we have found that continuing 
while L max Wi > Zie~ 5 gives consistent results. 

To summarise, the algorithm can be described in 
pseudo-code (where 9 ~ p{9\H, 1) means 9 is drawn from 
the distribution p(9\H,I)) as: 

1. Draw N points a , a e 1 . . . N from prior p{0), and 
calculate their L a 's. 

2. Set Z = 0, i = 0, logw = 

3. While L max Wi > Zie~ 5 

(a) i = i +1 

(b) L min = min({i a }) 

(c) \0gWi = loglUi-i - A^ 1 

(d) Zi = Zi-i + L min Wi 

(e) Replace mln with 9 ~ p(6\H,I) : L0) > L min 

4. Add the remaining points: For all a £ 1 . . . N , Z t = 

Zi + L(9 a )wi 

With the algorithm as it is outlined above, the crucial 
idea is the sorting of the likelihoods so that progressively 
smaller contour-lines can be assigned to each of them, 



and the integral (|25[) builts up. This leads us to a natu- 
ral means of parallelising the algorithm, so that we may 
take advantage of multiple processors or compute nodes 
on a cluster. If the algorithm is run in parallel with iden- 
tical data and parameters, but a different random seed 
(this requires no inter- node communication), the sets of 
samples generated will differ. If we save each sample and 
its likelihood value, we can then collate the results of mul- 
tiple runs, and sort the resulting samples by their likeli- 
hood values. So long as the number of parallel runs N Tuns 
remains constant as the integration progresses, each sub- 
sequent sample from the limited prior distribution can be 
treated as being part of a collection of Nt = J2k=i" ^ k 
samples - where each parallel run has Nk live points - as 
we no longer know which sample belongs to which run. 
This then allows us to re-apply the nested sampling al- 
gorithm as described above, but with a lower weight for 
each sample, substituting for N the number Nt- 

After applying this procedure, a more accurate esti- 
mate of the evidence integral can be obtained, using our 
greater number of total live points. This procedure also 
increases the accuracy of the evaluation of the posterior 
PDFs that wc discuss in the next Section. It was found 
that this procedure allows the accuracy to scale with the 
total number of parallel live points as shown in Figure [H 
provided that each run has a sufficiently large number 
of live points to avoid under-sampling of the parameter 
space. The issue of increasing accuracy at the expense of 
additional run-time is discussed further in Section [IV Bl 

B. Extracting the posterior PDF 

As the nested sampling algorithm proceeds, the list of 
points used in approximating the integral is stored, along 
with the likelihood values of each sample, the correspond- 
ing value of the parameter vector, and logX; « i/N. 
These samples are drawn from the prior distribution, lim- 
ited by a likelihood contour to a fraction Xi of the full 
prior, meaning that the density of the samples is boosted 
within the contour by the probability that is excluded, as 
it is zero outside the contour. We can therefore write in 
short-hand the probability density of the z-th point from 
the nested sampling output as 

^|NS) = «^, (32) 

whereas samples from the posterior PDF, Eq. © have 
probability density 

P (ei\d,H,i) ^ P {9i\n,i)p{0 h nj). (33) 

Since the nested sampling points are independent sam- 
ples, they can be re-used to generate samples from 
the posterior PDF by re-sampling them. Substituting 
Eq. (|32p into Eq. ([33]). it is easy to see that the proba- 
bilities are related by, 

p0i\d,H,T) o:p(9 i \NS)p{d\0 i ,n,I)X i , (34) 



9 



and so the resampling weight of each one is oc 
p{d\6i,%, I)Xi. As a consequence, the joint posterior 
PDF can be easily calculated by post-processing the out- 
put of the nested sampling algorithm (at negligible com- 
putational cost). Marginalised posterior PDFs, Eq. © 
can then be obtained as in the case of MCMC methods, 
by histogramming the samples. In this way we can eas- 
ily perform both evidence integrals and estimation of the 
posterior PDF, making both model selection and param- 
eter estimation possible. It is important to note that 
the method of extracting posterior samples using nested 
sampling is different to that in standard MCMC algo- 
rithms, as the algorithm is designed to move the ensem- 
ble of points uphill from a sampling of the entire prior 
toward the highest likelihood point, whereas MCMC can 
also move downhill (with probability < 1) and requires 
sufficient burn-in time to fully explore the full range of 
the prior. If the location of the true maximum is not 
known, nested sampling offers the ability to home in on 
the true location with its geometric shrinkage of the sam- 
pling volume, making it an excellent tool for searching 
the parameter space for maxima. 

On the other hand, the number of posterior samples 
generated by the nested sampling algorithm is limited by 
the number of live points used, with more live points cor- 
responding to more samples within the uppermost con- 
tour lines. In order to get the posterior sampling desired, 
it might be necessary to increase N, or run parallel com- 
putations, which has the effect of causing the algorithm 
to converge more slowly (but also more accurately, see 
Section MM- 



IV. IMPLEMENTATION DETAILS 

In the previous section we have described the concep- 
tual approach to the computation of the evidence integral 
using a nested sampling technique; furthermore, at the 
end of Scction llll A[ we have provided a pseudo-code with 
the key steps of the algorithm. One of the key challenges 
in the efficient implementation of the algorithm is to re- 
place at each iteration the active point characterised by 
the minimum value of the likelihood function L m \ n , draw- 
ing a sample from the prior distribution limited to the 
volume that satisfy the condition L > L m [ n - We do so 
by means of a Metropolis-Hastings Markov chain Monte 
Carlo with M steps that proceeds as follows. We ran- 
domly select one of the N live points, corresponding to 
say 9, that we assume as the starting point of the Markov 
chain, the "current state" . We then propose a new state 
9' drawn from a proposal distribution (also called tran- 
sition kernel) q(8,6'), i.e. the probability of 9' given 9. 
The new state is accepted with probability 



a H ( 




P (9)q(6,6>) 



L{9<) > L s 
L{9') < U 



(35) 



and equivalently the chain remains at 9 with probability 
1 — an(8,9'). We continue to evolve the chain to accu- 
mulate M states, and the last one is set to be the new 
live point that replace that characterised by L = L m i n . 
If no points have been accepted during the M propos- 
als, then the chain will have remained at the pre-existing 
point, and so we must re- run the chain using a different 
live point as an initial state. 

In Section IIV Al we describe the details of the explo- 
ration of the limited prior, that is the choice of q(0,6'); 
in Section IIV Bl wc quantify how the errors in the ev- 
idence evaluation scale with the number of live points 
and MCMC elements, N and M, respectively, and how 
they are related to the CPU processing time. 



A. Sampling the limited prior 

In order to produce a new live point for each itera- 
tion of the nested sampling algorithm, it is necessary to 
draw a sample from the prior distribution, limited to vol- 
umes with likelihood greater than L mm . This distribution 
changes from the entire prior distribution at the zcroth 
iteration, to a tiny fraction, typically < I0~ 10 , of the pa- 
rameter space when the posterior mode has been located, 
see e.g. Figure [2] In between these extremes, as £ m in in- 
creases, it will cause "islands" of probability to separate 
from each other and disappear, as if being submerged 
by a rising tide. There may also be multiple maxima 
of similar likelihood values, and these modes are gener- 
ally curved or "banana-shaped" in the multidimensional 
volume, an example of which is shown in Figured] 

Maintaining an accurate and efficient sampling of all 
these islands is the biggest challenge in implementing 
nested sampling, as it is in other Monte Carlo methods 
such as MCMC's. The character of these islands and 
their shapes will vary from problem to problem, but here 
we have attempted to proceed in a general way, through 
the use of a semi-adaptive Markov chain Monte Carlo al- 
gorithm to sample the prior, where the number of itera- 
tions of the chain M can be specified. This approach was 
augmented with custom proposals, based on some sim- 
ple intuition on the structure of the likelihood function, 
which were found to improve the efficiency of sampling, 
or alternatively the speed of chain mixing, and will be 
described below in section HV A II 

By using an MCMC sampling of the prior, we have to 
choose a proposal distribution q{9, 9') which will give a 
decent acceptance ratio at all stages of the nested sam- 
pling. As the scale of the problem varies by ten or more 
orders of magnitude, this is impossible to achieve with a 
static choice of proposals. However, we have additional 
information available, in the form of the location of the 
N live points, which can help us select proposal distribu- 
tions dynamically. 

As the collection of live points shrinks at each iteration, 
we can obtain an estimate of the size and orientation of 
the area we need to sample by computing the covariance 



10 



matrix C of the collection of live points, 

Crj = m - (OiWi - (0 3 -») , (36) 

where the indices i,j denote the dimensions (9 in this 
specific case) of the parameter space, and (.) should be in- 
terpreted as the sample mean over the active live points. 
In the case of the cyclical angular parameters (f>o and a, 
we need to take into account the wrapping of the bound- 
ary, and set the covariance between these and the other 
parameters to zero. The variances C^ O O and C aa are 
then computed using the circular mean and circular dif- 
ference in place of the mean and difference in the above 
expression [80| . The matrix dj is re-computed at regular 
intervals (in practice, once every TV/10 iterations) as the 
shrinking proceeds, scaled by a factor of 0.1, and used in 
the sampling of a multivariate student distribution, with 
mean centred on the previous point, and two degrees of 
freedom, when evolving the prior samples. In order to 
do this, a vector v is drawn from the multivariate normal 
distribution with mean zero and covariance matrix given 
by the expression (|36[) , then multiplied by a factor Uljx, 
where x is drawn from a % 2 distribution with two degrees 
of freedom to yield a multivariate student deviate. The 
vector v is then added to 9 to yield the new proposed 
point 6' = 6 + v. 

The advantages of this somewhat ad-hoc method are 
that it is fast to compute, and is applicable whenever 
the number of live points is greater than the dimension- 
ality of the problem to avoid a singular matrix (which 
in practice is always the case). As a consequence, it can 
be used without modification if one wants to examine a 
model with additional parameters, or with some of the 
parameters constrained. It also adapts to the shrinking 
volume of the limited prior distribution as the algorithm 
progresses. This type of proposal is used for the major- 
ity of the jumps used in the sampler. However, there 
are certain types of proposals that we use specifically for 
short-lived bursts (such as gravitational waves generated 
by coalescing binaries) specifically discussed in this paper 
that are designed to move along the degeneracies of the 
distribution. 



1. Custom proposals for sky position 

The coherent analysis of data from interferometers at 
different locations on the Earth, as it is the case for 
the LIGO- Virgo network, offers a number of advantages 
that we will discuss in more detail in the next Section. 
Particularly relevant for the implementation details dis- 
cussed here is the fact that multiple instruments allow 
us to reconstruct partial or full information (depend- 
ing on the number of instruments in the network and 
their location/orientation) about the source position in 
the sky. Such information is encoded in the structure 
of the likelihood function, and its functional dependency 
on (to, a, 5). However, distinctive features in the dis- 
tribution also provide a challenge in exploring them in 
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FIG. 3: The structure of the likelihood function can clearly 
be seen in the samples from the posterior PDF. These figures 
show the distributions obtained when using the Livingston- 
Virgo (x), Hanford- Livingston (x), Hanford- Virgo (x) and 
Hanford-Livingston- Virgo (x) networks of detectors to anal- 
yse the same stretch of simulated data. The true position 
of the injection is marked *. With two detectors, the dis- 
tribution lies on the circle produced by keeping the time of 
arrival constant in both instruments. With a network of three 
detectors, the timing can be kept constant by reflecting the 
position of the source across the plane of the three detectors. 
This causes two local maxima at the two places where the 
three circles intersect, with one of these being the true loca- 
tion. 



a accurate, yet efficient way. The relative timing offset 
between the arrival time of a gravitational wave at mul- 
tiple sites encodes the majority of the information about 
location in the sky (of course additional information is 
contained in the antenna beam patterns F + _ x ). More 
specifically, given two detectors there exists a locus of 
points in the sky that traces a circle about the baseline 
between the two sites that yield the same time-delay at 
the two detectors. If one adds a third site, the three cir- 
cles intersect in two points: one corresponds to the actual 
source position in the sky, whereas the other represents a 
"mirror image" that is located on the opposite side across 
the plane that passes through the three sites. The likeli- 
hood function therefore exhibits certain degeneracies or 
near-degeneracies in the (to, a, S) subspace of parameter 
space. This results in distributions which trace out arcs 
of a circle on the sky, with modulation in to 7 as shown in 
Figure |3] 

In the exploration of the parameter space one can 
therefore take advantage of the known geometrical 
symmetries of the problem, by suitably choosing the 
proposals that control the geometrical parameters. It is 
therefore much more efficient to move "in circles" in the 
sky - subject to the constraints that we have discussed 
above - rather than to make proposals based on the 
distribution discussed in Section IIV Al We have indeed 
observed that custom-made proposals dramatically 
improve the performance of the algorithm, both in 
term of efficiency and accuracy. As the relative timing 
offsets between detectors provide the majority of the 
information about location on the sky, we therefore 
propose a fraction of new states of the MCMC chain 
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by keeping the time of arrival of the signal constant 
in each detector. In the case of a network of two 
detectors, this constrains the jump to a ring, centred 
on the vector between the two detectors, which we 
sample by applying a rotation matrix with uniform 
random angle between and 2tt to the position vector 
of the source (and accounting for the relative rotation 
between the earth-fixed detector co-ordinates and the 
sky-fixed source co-ordinates). In the case of three 
detectors, in order to keep the same time of arrival in 
all detectors, the source must be reflected in their plane. 
As a consequence, if x is the current and x' the proposed 
Cartesian unit vector to the source, and the detectors 
are located at the points xh, x*l, xv in the same 
co-ordinate system, with a normal to their plane n = 
[{x L - x H ) x (x v - x H )} I \ [(x L - x H ) x (x v - Xh)]\, 
the jump is therefore 

x' = x — 2n \n ■ (x — xh)\ ■ (37) 

In both cases, as the detectors are offset from the geocen- 
tre, and it is the time of arrival at the geocentre to which 
is used as a parameter, there is also an adjustment to be 
made to this parameter when moving to a new sky loca- 
tion, t' = to + %H • {% — %')■ Here, xh etc are measured 
in seconds. 

The custom proposals discussed here apply to any like- 
lihood exploration that involves short-lived (with respect 
to the time-scale of the Earth's rotation and orbit) bursts 
of gravitational waves, as it is simply connected to the 
relative time delays observed between different detectors. 

2. Differential Evolution 

When the limited prior distribution splits into multi- 
ple isolated islands of probability, see e.g. Figure the 
method of using the covariance matrix of the live points 
as a proposal distribution leads to less efficient sampling. 
In order to combat this effect, it is necessary to propose 
jumps which have a length scale characteristic of mov- 
ing between, or within the multiple modes. One possible 
technique is to analyse the current live points as belong- 
ing to a number of clusters, then propose new states from 
within these clusters, which is the approach adopted by 
the MultiNest algorithm [H[6l|]. 

In our implementation, we have introduced a new type 
of MCMC proposal which attempts to capture some of 
the structure of multi-modal distributions, based on a 
simple iteration of proposals inspired by Differential Evo- 
lution MCMC algorithms, see e.g. Ref. From the 
whole set of live points 6\, . . . , On, we select a random 
point, say 9 a that we want to evolve to point 9' a ] we then 
select two random existing points, say 9 b and 9 C , such 
that a 7^ b =/= c. The proposed new state is then given by 

d'a = $a + (9 C - 9 b ) , (38) 

which is accepted with the usual Hastings ratio, Eq. (f3"5l) . 
As the probability of drawing (6, c) for the random move 



is equal to that of drawing (c, &), the move is reversible, 
and therefore upholds the principle of detailed balance. 

When used for a fraction of the jumps (10% in our case, 
chosen through trial and error to explore adjacent modes 
while still maintaining good diffusion of points through 
the standard jumps), this type of move allows proposals 
to be made at all the characteristic scales between the 
different modes in a multi-modal distribution (as well as 
at the scale of the width of each mode) , and so increases 
the efficiency when such a distribution is encountered. 
Note that this type of proposal needs no scale to be set 
by the user, and is independent of the parameters used. 

B. Accuracy: quantifying errors 

Due to the probabilistic nature of the algorithm, when 
computing the Bayes factor there is an associated uncer- 
tainty with the result obtained by applying the nested 
sampling algorithm. As the evidence integral is written 

JVtot 

Z = J2 L ^ ( 39 ) 

i=i 

there is a Poissonian uncertainty arising from the variable 
number of iterations needed to find the region of high pos- 
terior probability, AT to t = NH ± %/ NH , which gives rise 
to an uncertainty in logZ of ±y/H/N . In [2l|, Skilling 
suggests that this error will dominate other sources of un- 
certainty, but makes the assumption that the sampling 
of the limited prior distribution is done perfectly. If the 
sampling is not done perfectly, for example if a small iso- 
lated mode is not properly sampled as there are no live 
points in its neighbourhood, an additional error will be 
introduced into the quantities Wi. Rather than attempt- 
ing to derive this quantity, we have performed Monte 
Carlo simulations on many sets of identical data and sig- 
nals, while changing the parameters of the algorithm TV 
and M , and examined how the distribution of estimated 
Z values changes. By doing this, we can also explore to 
which value N and M should be set to attain a given 
level of accuracy in the evidence computation. 

Figure [4] shows the theoretical \J H/N level of uncer- 
tainty, along with the actual standard deviation of the 
estimates of Z, over 50 trials, for a range of N and M. 
The actual distribution of recovered Z scales as ~ 1/ \f~N 
as theoretically predicted. It is how ever noticeably larger 
than that predicted by the ^JH/N error alone, and is de- 
pendent on both the number of live points and the num- 
ber of MCMC samples used. We can see that when N 
and M increase, the variance decreases, suggesting that 
there is an additional source of error related to the sam- 
pling of the limited prior distribution. This is not entirely 
surprising, as the limited prior distribution consists of a 
number of isolated islands, between which it is difficult 
to move with an MCMC sampler. If there is residual 
correlation in the MCMC chain used to sample the pa- 
rameter space, this will introduce an over-weighting of 
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FIG. 4: The statistical error (standard deviation) computed 
over 50 trials with different random seeds (identical signal and 
noise) plotted against the number of live points used, using 
M = 50, 100, 200, 500, 1000, and the theoretical prediction of 
the statistical error based on Skilling's estimate. These results 
show that the empirical error is greater than the theoretical 
one, indicating an additional source of uncertainty. This is 
greatly reduced with the number of MCMC samples used, 
with chains of 1000 samples (solid yellow line) approaching 
the theoretical limit (solid black line). This suggests that the 
extra uncertainty is produced by correlation between samples, 
caused by less than perfect mixing in the MCMC sampling of 
the limited prior. 
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FIG. 5: The mean number of likelihood evaluations performed 
in each of the trials shown in Figure [4] The number of likeli- 
hoods scales linearly with both the number of live points and 
the number of MCMC iterations used. 



the area of over-density of samples, which, depending on 
whether this is a region with higher or lower likelihood 
will cause an increase or decrease in the Z integral. As 
the correlation between start and end decreases with the 
length of the Markov chain, the added error decreases 
with increasing M. It is in fact clear that, at a given 
N, by increasing M the result approaches the theoretical 
error. 

By tuning N and M appropriately, we can therefore at- 
tain any desired level of accuracy, in principle, but at the 
expense of increasing the computational burden, as the 
number of likelihood evaluations is approximately pro- 
portional to the product NM. This can be seen clearly 
in Figure [SJ where we show the number of likelihood eval- 



TABLE I: Mass parameters of the injections used in testing. 



uations required to achieve the accuracies presented in 
Figure 01 The actual processing time then depends on 
the time taken to evaluate a single value of the likelihood 
function, which varies with the mass of the signal, and 
the length of data and sampling rate used. As an ex- 
ample, to compute the result with N = 500, M = 200, 
which gives an accuracy on log-B of ±0.8, ~ 3 x 10 6 
likelihoods were evaluated. In this case the algorithm 
took approximately 1 hour 20 mins to complete on a 2.4 
GHz Intel Xcon processor. Assuming that the overhead 
beyond the likelihood calculation is minimal, this gives 
an approximate time of 1.5ms per likelihood. The system 
used in these tests was a 30 Mq-1.4 M© binary system in- 
jected into 3 data streams, but the full parameter space 
(see Section IY Ap was explored so that templates across 
the whole low-mass range were generated. It should be 
noted that this number is mostly dependent on the time 
taken to generate the waveform, which is lowest for the 
stationary phase approximation templates used here, but 
is higher for time domain templates and those requiring 
the numerical solution of differential equations to pro- 
duce the waveform. 



V. RESULTS 

In this section we present a range of tests to demon- 
strate the effectiveness of a Bayesian approach in identify- 
ing gravitational- wave signals in the data from a network 
of interferometers and estimating the associated param- 
eters. We first investigate the detection efficiency of our 
algorithm, using log B as a "detection statistics" ; we then 
introduce and characterise a new test to discriminate a 
coherent gravitational- wave at the output of multiple in- 
struments from the presence of incoherent instrumental 
artefacts. We conclude by showing the impact of the 
number of instruments in the recovery of the source pa- 
rameters. 

For the tests presented in this Section, we have chosen 
four systems of different combination of masses for the 
values used in the injections, shown in Table [I] The first 
three systems lie near the corners of the irregular prior, 
shown in Figure [6] and discussed in Section lVAl while the 
fourth lies somewhat towards the middle. Using these 
systems, the performance of the algorithm across differ- 
ent signals lying in different parts of parameter space is 
assessed. 

Unless stated otherwise, we run all tests using three 
simulated interferometers operating as a network. These 
are located at the sites of LIGO Hanford, LIGO Liv- 
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FIG. 6: p{M,r]\H s , I), the prior probability distribution for 
the parameters M and r\. The white area is excluded by the 
boundary conditions imposed on the component masses (see 
section IV A|) , and the distribution given by equation 1431 



We also place a lower limit on the mass ratio, such that 
of 77 > 0.01, and on the chirp mass, M > 0.87, to en- 
sure that the waveforms generated are non-zero and of a 
length suitable for our analysis. These constraints result 
in a convoluted shape for the allowed regions of param- 
eter space in the (M,rj) plane, shown in Figure [5] The 
specific choice of the distance and mass priors is deter- 
mined by the need to ensure the accuracy of the integra- 
tion, which we now discuss. 

Within the core sampler, we have changed the variable 
used to logM, in order to reduce the range of the prior 
density, leading to better sampling of the space than us- 
ing M itself. The prior probability density function we 
use on log M is based on an approximation to the Jeffreys 

prior, p0\H, I) oc J&etT{ii?), where T(0) is the Fisher in- 
formation matrix, defined as 



ingston and Virgo (Cascina), and have simulated noise 
power spectral densities chosen to correspond to the de- 
sign sensitivics of each of these real instruments, as mod- 
elled by the appropriate LAL functions. The Gaussian 
and stationary coloured noise is then generated in the 
frequency domain, with the appropriate noise spectrum. 
A low-frequency cut-off of 50 Hz and Nyquist frequency 
of 1024 Hz are used throughout the analysis. 



A. Priors 

The choice of prior distribution p{6\H,I) is an im- 
portant factor in Bayesian inference, and will affect the 
Bayes factor as it is included in the evidence integral, 
Eq. ((4]). The prior effectively determines exactly which 
model is being used, by incorporating the ranges of the 
model parameters, and probability distribution on those 
parameters before the data are analysed. In the case 
of our implementation, the prior must be sampled using 
an MCMC technique, which will be more efficient when 
there is minimal structure in the chosen paramctcrisa- 
tion of the signal, in the sense that fewer iterations will 
be required to adequately sample it. 

For many of the model parameters, the choice of prior 
distribution is obvious: we use an isotropic distribution 
for the source sky location (a, S) and the direction of the 
orbital angular momentum (t, ijj) over the full range on 
the angular parameters of the model, reflecting total ig- 
norance and no reasons to prefer a particular geometry 
of a binary. We also choose a flat distribution on 4>q and 
to over the range < 4>o < 2w and a time interval of 100 
msec, respectively. For distance and masses, we use a 
uniform prior on logD^ in the range Dl E [1, 100] Mpc, 
a uniform prior in r\ and a prior on chirp mass of the 
form p(M\I) oc M^ 11 ^ 6 . As we desire to test our ap- 
proach on the mass region covered by the LIGO- Virgo 
low- mass searches for in-spiral signals [l2Lfl3| | . limits were 
imposed directly on the component masses, such that 
rri\^mi < 35M Q , where by our convention m\ > mi- 
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dfh{6) 



(40) 



This type of prior is used when there is no information 
about the parameters of the signal at all, and should 
therefore be invariant under a change of co-ordinates 
[72I [73| . For simplicity, in our initial implementation we 
ignore the correlations between the chirp mass and the 
other parameters and we just take the leading order New- 
tonian quadrupolc approximation of the in-spiral wave- 
form h(f) to compute the scaling of the prior. Under 
these assumptions we obtain 



dlogM 
and the prior is therefore 



(41) 



p(logM\I) oc 



\ 



8h(6) 



dlogM 

oc M~ 5/c \ 
p{M\I) oc X- 11 / 6 . 



dh{6) 



dlogM 



(42) 
(43) 



The Jeffreys prior is based on the notion that one 
should assign equal probabilities to equal volume ele- 
ments in the parameter space of the signal. Here the 
Fisher matrix is used as a metric, allowing us to calcu- 
late the volume element at each point. The Jeffreys prior 
encodes the fact that the volume element of a curved 
parameter space may vary with respect to the parame- 
tcrisation, and so the density of templates is greater at 
lower chirp masses. Failure to account for this will re- 
sult in an under-sampling of certain regions of parameter 
space, which will increase the chances of failing to de- 
tect a signal there (or increase the number of live points 
needed for a given probability of detection). This effect 
is most significant in the M parameter, which is why we 
have found it necessary to include it here, however a full 
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FIG. 7: The distribution of the log Bayes factor for 5 000 runs 
on synthetic Gaussian noise. The vertical solid line represents 
the 1% false alarm rate threshold of log Bs,n = 2.786. 



calculation of the Fisher metric in the M. , rj space would 
further improve the sampling, and is a goal for future 
development of this work. 

Note that the use of this prior, as with the one for dis- 
tance, ignores any available information about the mass 
distribution of neutron stars and black holes, and focusses 
simply on the detection of the signal with unknown pa- 
rameters. 



B. Detection efficiency 

In order to test the detection efficiency of our imple- 
mentation, we have chosen to treat the Bayes factor of 
the signal vs noise hypotheses 



B, 



S,N 



P{d\H s ,I) 
P{d\H N , I) 



(44) 



see Eq. © and Section III D[ as a detection statistic. 
By performing a large number of runs on a signal-free 
dataset, we can find the distribution of log i?s,N in the ab- 
sence of a signal, and therefore choose a threshold value 
which will give a certain false alarm rate. This is the 
same approach that we have adopted in Refs. [24j |. how- 
ever in this paper we consider coherent observations with 
multiple interferometers. Using this threshold, we can 
therefore decide whether or not the analysis of a data set 
which contains an actual injection yields a detection or 
not. To achieve this, we analysed 5000 different realisa- 
tions of Gaussian noise, and obtained the distribution of 
log £?s,n- This is shown as a histogram in FigureO where 
the vertical line represents the threshold log £?s.n = 2.786 
corresponding to a false alarm rate of 1% for any single 
trial. 

Taking this threshold, we then classify each result as 
detected if log-B > 2.786, and otherwise not. The de- 
tection efficiency is then assessed by performing 50 injec- 
tions of each test signal at varying signal-to-noise ratios 
(by changing the distance to the source) and determin- 
ing the fraction which are detected vs those which are 
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FIG. 8: Detection efficiency curve for the nested sampling 
algorithm for the four test systems as a function of the coher- 
ent network signal-to-noise ratio, with a detection threshold 
of log-Bs.N > 2.786, corresponding to a 1% false alarm rate, 
see Figure [7] Error bars indicate the 67% probability inter- 
val assuming a binomial distribution for the results of the 50 
trials. 



not. This allows us to build up a detection efficiency 
curve for each of the test systems, given the desired false 
alarm rate, which is shown in Figure |U From these re- 
sults, we can see that the 30 — 1.4 Mq, 15 — 15 Mq and 
25 — 5 Mq systems show consistency in their chance of 
being detected as a function of signal-to-noise ratio, fol- 
lowing a typical sigmoid curve with a transition zone be- 
tween signal-to-noise ratios 4 and 8 where there is an 
intermediate chance of detection, explained by the differ- 
ent noise realisations. Each of these curves crosses the 
50% detection efficiency at approximately signal-to-noise 
ratio of 6.5, and approaches 100% detection efficiency 
with a signal-to-noise ratio above 8. In contrast, the al- 
gorithm performs slightly more poorly in the detection of 
the binary neutron star system with 1.4 — 1.4 Mq com- 
ponent masses, with 50% detection efficiency at signal- 
to-noise ratio of 7.5 and a wider zone of transition. An 
examination of the raw Bayes factors output by the algo- 
rithm for this system indicated that the detected binary 
neutron star signals are allocated Bayes factors consis- 
tent with other signals of the same signal-to-noise ratio, 
but that there is a larger fraction of sources which are 
not detected, producing a Bayes factor consistent with 
noise. This may be due to the sampler failing to identify 
the correct region of parameter space, as the parameter 
volume of signals at low mass is considerably smaller, 
and therefore has a lower probability of being found by 
a probabilistic algorithm. 

It is suggested that improved performance could be 
obtained for these systems by incorporating the full met- 
ric into the calculation of density required in the A4 , rj 
subspacc, which would then distribute the samples more 
appropriately. However, the results broadly show that 
the algorithm is capable of correctly analysing and de- 
tecting signals at an SNR comparable to existing meth- 
ods. Other recent work has shown that prior distribu- 
tions may be found which balance the need for efficient 
detection with astrophysical prior information 53 1. 
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FIG. 9: Receiver Operation Characteristics curve for the 
nested sampling algorithm, obtained with 50 trials at each 
signal-to- noise ratio for the 30 M© — 1 M© test binary. 

Using the distribution of Bayes factors produced by 
the noise-only runs, we can also examine the relationship 
between false alarm rate and detection efficiency. For a 
choice of signal-to-noise ratio, the threshold of detection 
is varied, causing a change in the number of detected 
signals but also the number of false alarms, which can be 
plotted in a receiver operations curve (ROC), as shown 
in Figure [HJ In this figure, we have used system 1, see 
Table U with component masses 30 Mq and 1.4 Mq to 
produce the ROC curves for SNRs below, in and above 
the transition region. These results were produced with 
50 independent trials at each optimal SNR, and so the 
error bars shown represent the Poissonian error of 50 -1 / 2 . 

C. Coherent Analysis 

Every gravitational wave search critically relies on mul- 
tiple (at least two) instruments in order to make confident 
detections of astrophysical signals. Multiple interferom- 
eters are beneficial in two main ways. Firstly, the sig- 
nal in each detector can be cross-checked against those 
observed in the others, giving us an additional way of 
isolating a real gravitational wave which must yield con- 
sistent observations in all the instruments; in fact, the 
searches for short-lived gravitational waves employ one or 
more "detection confidence tests", see e.g. [75( and ref- 
erences therein. Secondly, if the interferometers are not 
co-located and aligned (as is the case of the ground-based 
network currently in operation) then they will see a dif- 
ferent projection of the strain tensor of the passing grav- 
itational wave; in turn, this allows a better estimation 
of the parameters of the incoming signal, and can break 
degeneracies between parameters present with only one 
data set. In particular, simultaneous observations with 
three or more instruments allows us to determine the 
geometry of the source, such as its distance and location 
in the sky. 

Using the mathematical and computational frame- 
work developed and described above, we can address 
both these points in a natural way. In this section we 
propose and demonstrate a new coherence test which 



can discriminate between coherent and incoherent sig- 
nals, making optimal use of the data and we demonstrate 
the improvement in parameter estimation when using a 
detector network. 



1. Coherence Test 

Having a network of detectors is essential to discrim- 
inate a real astrophysical signal from the background of 
spurious noise events which resemble gravitational wave 
signals, i.e. those with £?s,n 3> 1 in the language of 
model selection. We know that a real gravitational wave 
signal must be observed in all the detectors with compat- 
ible estimates for the physical parameters, and relative 
time delays that are consistent with the location of the 
instruments on Earth and a point on the celestial sphere. 
On the other hand, instrumental glitches will appear in- 
dependently in each detector. To be more specific, the 
observed data in each detector must be consistent with 
the physical gravitational wave: the different characteris- 
tics of each detector, including their instantaneous noise 
levels and orientations mean that the signal-to-noise ra- 
tios will vary between them. 

There will naturally be times when glitches occur si- 
multaneously in multiple detectors. When this happens, 
we can use the network in a coherent manner to test 
whether the event is consistent with a coherent gravita- 
tional wave, or more likely to be a gravitational- wave-like 
glitch occurring independently in each detector. Trans- 
lated into our framework of inference, we want to com- 
pare the two following hypotheses: 

• Coherent model, T-L co h'- The datasets d = 
{d^\S 2 \...,d^ N ^} from each of the N D de- 
tectors contain a coherent gravitational-wave sig- 
nal described by the same polarisation amplitudes 
h + (f; 9) and h x (/; 0) with the same parameters 8. 
The posterior probability of the coherent hypothe- 
sis is as before 

P(H coh \d)= P(Hc ° h) Z coh , (45) 
P(d) 

where the evidence is 

Z C oh = / P {9\H coh )p(d\e,n coh )d9 (46) 
Je 

and the joint likelihood of the observation d is 
p(d\9,n coh ) = l\? D P (d^\9,H coh ), sec Eq. (22). 

• Incoherent model, Jiinc- The data set at each in- 
strument, d^\ d^ 2 \ ...,d( No } contains indepen- 
dent gravitational-wave-like glitches, characterised 
by the parameters 9^\ 9^ 2 \ . . . , 9^ Nd \ in general 
different for each detector. In this case, assuming 
that the data and signals at each instrument are 
independent, the marginal likelihood of the model 
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factorises into the marginal likelihoods of each sig- 
nal in the relevant detector, and the posterior prob- 
ability is 

P(H inc \d) = -± 



P(d) 

P(d) i\ 



(47) 



where the evidence for the signal in each detector 
is 



Z (i) 



e<*> 



p(ft l) \H inc ) p(d« |0W , H iac )d^ ; (48) 



in the equation above p(d^\9^\'Hi nc ) is the like- 
lihood of the data set cfW at the i— th instrument 
output, characterised by the parameter vector #W 
defined over the space 

In order to distinguish between these possibilities, we 
need to compute the odds ratio, Eq. (J2), between the 
coherent and incoherent model 



_ P(H coh ) 

"coh.mc — r>(u \ ' 



coh,inc 



where the Bayes factor is 



B, 



coh.i 



nfzw 



(49) 



(50) 



see Eqs (|4"S")) and (|4T|) . Essentially the test computes the 
difference between the integral of the product and the 
product of the integrals for each datasct. The incoherent 
model used here might be regarded as the worst possi- 
ble type of glitch, in that it models a disturbance which 
appears exactly as a real gravitational wave would in a 
single detector. As usual the evaluation of the odds ratio 
requires to specify the prior odds, which is a subjective 
matter, and here we simply concentrate on the Bayes 
factor. 

Why does this work? If we think of the parameter 
space of the coherent model as being embedded in the 
larger parameter space {0^\O^ 2 \ . . . ,6^ Nd ^} of the in- 
coherent model, we see that it lies in the nine dimen- 
sional subspace where 6** (1) = 9^ = . . . = & Nd \ If a 
coherent signal is present, the distribution in the inco- 
herent space will be peaked on or near this subspace, 
which will intersect a relatively large total probability 
mass. As the coherent subspace has fewer dimensions, 
in this case Q Nd ~ 1 1 its prior is smaller and the model 
is more predictive. Loosely stated, this means that it 
will gain whenever its prediction is correct over the more 
general incoherent model. This is the so-called Occam 
factor which arises from comparing models with different 
predictive power. 

The larger space of the incoherent model will also cap- 
ture a greater amount of evidence from the data when a 
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FIG. 10: Bayes factor B C oh,inc between the coherent and in- 
coherent model, Eq. (|50l) with varying relative time delay be- 
tween the arrival time of a gravitational wave at different 
interferometers. In solid blue, the value of -B C oh,mc when the 
network of the LIGO-Hanford, LIGO-Livingston and Virgo 
instruments (HLV) was used, and in dashed red the network 
of the two LIGO interferometers (HL) . The time delay in mil- 
liseconds is applied equally between Hanford and Livingston, 
and Livingston and Virgo. The signal used had component 
masses 3Mq and 4Mq , and a coherent network signal to noise 
ratio of 17.8. The curve shows a strong fall in coherence prob- 
ability above a time-delay of « 3.5 msec, allowing us to rule 
out a single coherent signal. 



large glitch is present which causes a high likelihood in 
many parts of the parameter space, as any signal which 
is not completely paramctcriscd by the chosen parame- 
tcrisation will produce a broader peak on that manifold 
(such a glitch may provide a high evidence value when 
compared with the noise model only, but contains less 
information about the parameters). In this case (unless 
there is by chance a coherent glitch in the other detec- 
tors) the same argument will apply and cause the test to 
discriminate against the coherent signal model. 

Performing model selection with the Bayes factor, 
Eq. (|50[) , will then give us the optimal means of distin- 
guishing a coherent gravitational wave from incoherent 
glitches which have a significant component which looks 
like an in-spiral signal. This can be thought of as an en- 
hanced coincidence check, which uses all the parameters 
to check for consistency in the observed signals, and also 
incorporates a check for a better explanation as simulta- 
neous glitches. 

Here we provide two examples to test this technique. 
Firstly we consider how the test performs in the analysis 
of data sets that contain gravitational-wave signals that 
have identical parameters, but are characterised by an 
unphysical time-offset; in the second case we analyse the 
case in which a gravitational wave is present only in the 
data of one of the instruments of the network. 

For the first test, we inject a signal characterised by 
identical physical parameters - we choose Mi = 4M Q , 
AI-2 = 3 Mq and random position, orientation and dis- 
tance such that the network optimal SNR is 17.8 in the 
case of a coherent injection - into the simulated data 



17 



streams of LIGO-Hanford, LIGO-Livingston and Virgo; 
we first perform the injection coherently in the data of 
the three interferometers, and then we repeat it by intro- 
ducing a non-physical time shift AT < 10 ms in the time 
of coalescence at the different sites (the coherent injec- 
tion case corresponds therefore to AT = 0). The noise 
realisation is identical in each case; however, due to the 
slightly offset time of arrival of the signals and therefore 
the slightly different sum of data and signal, there is some 
spread in the recovered evidences. We compute -B C oh,inc, 
Eq. (|50")l as a function of AT, considering the case in 
which the analysis is carried out using the two LIGO in- 
struments (HL-network) and the three-instrument (HLV- 
network) . The results are shown in Figure I10[ where we 
plot the Bayes factor against the time shift AT. As ex- 
pected, when the signal is injected with AT = the evi- 
dence favours the coherent model strongly, by a factor of 
~ 36 461 in the case of the HLV-network and ~ 328 in the 
HL-network. As the time shift AT increases (the light- 
travel-time between Hanford and Livigston is « 10 msec), 
the evidence switches to favouring the incoherent model 
at around 4 msec in both HLV and HL networks, and 
rapidly decreases to strongly exclude the coherent model 
with a Bayes factor of < 10 -10 . With a large separation 
between the signals, the incoherent model becomes very 
strongly preferred, by a factor up to ~ 2 x 10 22 in the 
case of the HLV and ~ lx 10 24 in the case of the HL net- 
work. One would naively assume that HLV would yield 
more stringent rejection than HL, which is not the case 
here We note that for AT ,$3.5 msec the coherent model 
is still favoured, as the probability distribution still con- 
tains a sufficient probability of intersecting the coherent 
signal manifold. Although this plot is merely represen- 
tative of a single test of coherence and particular details 
will vary with the signals and datasets, we have specifi- 
cally used identical signals so that the PDF should peak 
at the same value in all but the time parameter. This 
should correspond to a situation where it is extremely 
difficult to determine whether or not the multiple signals 
are incoherent or coherent and provide a challenging test 
of the method. 

We consider now a second test of this method. A situ- 
ation which commonly arises is the presence of a "glitch" 
or instrumental artefact in a single interferometer which 
is not present in the others in the network. This situation 
is handled in the case of an incoherent search by check- 
ing that corresponding triggers, with consistent physical 
parameters, exist in all interferometers. As a useful san- 
ity check for the coherence test, we examined the case in 
which the "glitch" has exactly the functional form of a 
gravitational wave from an in-spiral signal, but is present 
only in one instrument. In this case, the Bayes factor of 
the coherent model against the Gaussian noise model is 
elevated, however our coherence test should allow us to 
exclude an event such as this due to the lack of a consis- 
tent signal in the other detectors. 

Table [TT1 displays the detailed results of performing the 
coherent and incoherent analyses on a signal with an SNR 
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H 




-5950.45 




45.78 




L 




-6106.47 




0.39 




V 




-6059.44 




-0.80 




HL 


-12056.93 


-12058.00 


46.16 


45.09 


-1.07 


HLV 


-18116.37 


-18123.29 


45.36 


7.52 


-6.92 



TABLE II: Results of performing the coherence test on a sig- 
nal injected only into the LIGO-Hanford simulated data set. 
The test successfully rules out a signal which does not ap- 
pear in more than one detector, despite the coherent signal 
vs. noise comparison (Bs,n) still favouring the signal model 
for HLV and HL observations. 



of 9.8 injected only into LIGO-Hanford (H) simulated 
data. It is notable that even using the coherent model, 
the signal causes an elevated Bayes factor to be found, as 
there is some set of parameters which give a compromise 
signal in the network of detectors. This gives us the un- 
desirable situation where Bs^n, the Bayes factor of signal 
against Gaussian noise, Eq. (|44[) . would be triggered by 
an event in a single detector. However, by performing 
the coherence test, we can see that the incoherent model 
is favoured by a factor sa 10 3 to the coherent model, indi- 
cating that this situation is unlikely to be a true coherent 
gravitational wave, and so it can be safely ruled out. 

This also works in the case of a two-detector network, 
although in this case there is a much stronger possibility 
that a signal may be observed in only one of the two 
detectors, and so the corresponding analysis infers that 
there is only a 2.9 times greater chance of the incoherent 
model than the coherent one. 

This test of coherence gives us a powerful means of dis- 
tinguishing coherent from incoherent events, which can 
be used to quantify the additional confidence that we 
achieve through the use of a network. This result fol- 
lows naturally from the precise statement of the hypothe- 
ses using the conceptual and computational framework 
that we have developed and demonstrates the power that 
Bayesian methodology can bring. 



2. Parameter resolution 

We have shown in Section fill Bl that from the output 
of the nested sampling algorithm for the evidence/Bayes 
factor computation one can construct at no additional 
computational costs the marginalised posterior PDFs on 
the unknown source parameters. Here we show an exam- 
ple of the evaluation of such PDFs with nested sampling 
as a function of the number of instruments in the net- 
work, applied to the coherent observations of an in-spiral 
binary signal. We consider a system with an optimal 
signal-to-noise ratio of 9.3, 12.8 and 14.4 respectively in 
the simulated network configurations of Hanford only, 
Hanford-Livingston and Hanford-Livingston- Virgo. The 
actual values of the parameters used for the injection are 
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FIG. 11: The one-dimensional marginalised posterior probability density functions of the nine parameters that describe a 
gravitational wave in-spiral signal from a circular binary of non-spinning compact objects. The plots show the effect of the 
coherent network analysis on the estimation of the parameters. The true values of each parameter is indicated by a vertical, black 
line. The signal was injected in simulated Gaussian and stationary noise representing the LIGO-Hanford, LIGO-Livingston and 
Virgo instruments, with an optimal signal-to-noise ratio of 9.3 (H), 12.8 (HL) and 14.4 (HLV), respectively. As more detectors 
are added to the network, the parameters of the signal become better constrained, with three detectors being necessary to fully 
resolve all the signal parameters. Lines represent kernel density estimates of the parameters, based on the samples from the 
PDF generated as in section [III Bl Edge effects from the smoothing function are responsible for lowered density estimates near 
the edges in the distributions of <j>o and rj. The density estimation was performed using the Matlab function ksdensity. 



shown by the black vertical lines in Figure QTJ 

Each instrument measures essentially two independent 
quantities - an amplitude and a phase - as a function of 
time. As the duration of an in-spiral is negligible in com- 
parison to the period of rotation of the Earth, there is 
no observable evolution of the antennae response func- 
tions during the period of observation (from which one 
would otherwise reconstruct the source location in the 
sky). From the signal strain and the time of arrival of 
the gravitational- wave burst, one must infer the parame- 
ters M., r), to, Dl, oe, 6, ip and t. The chirp mass M. (and 
to lesser extent the symmetric mass ratio rj) determines 
the phase evolution of the signal, which provides a large 
amount of information to constrain this parameter inde- 
pendent of the others listed here. However, in the remain- 
ing parameters a large degree of degeneracy is present, 
producing correlated joint posterior PDFs. For the case 
of observations with one interferometer, this manifests 
as a broad distribution in the one dimensional marginal 
PDFs shown in Figure 1111 When there is insufficient in- 
formation available to determine these parameters, the 
posterior PDFs can be influenced more strongly by the 



prior distribution, as it is the case in particular for Dl, 
a, 6, ip and L. 

With the addition of a second, independent detector 
at a geographically different location, the possible sky 
locations and times of arrival are strongly constrained to 
lie on the surface described in Section HV All but there 
is still substantial uncertainty in the marginal distribu- 
tions for the chosen parameters. This is finally broken 
when a third detector is added to the network (red line 
in Figure [TTj) , which allows the sky location to be deter- 
mined uniquely, along with the remaining parameters. 
In this figure we can see the evolution of the posterior 
PDFs with additional observations, as described above. 
For this injection, an inclination was used which placed 
the orbital plane of the binary almost edge-on to the line 
of sight to the Earth, meaning that only one polarisation 
was detectable. This is reflected in the under-cstimation 
of the distance to the system for a network of less than 
3 detectors, as there are many positions which are not 
edge-on which lead to a higher overall observed signal 
amplitude, and therefore must be located farther away. 

This provides an example of the necessity of multiple 
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detectors, operating as a network, if we are to make full 
use of the astrophysical information carried by gravita- 
tional waves. By undcrdctermining the parameters of 
the signal, biases or at least additional uncertainties may 
be introduced into our conclusions about the nature of 
observed sources. 



VI. CONCLUSIONS 

By taking a Bayesian approach to the analysis of data 
for the detection and characterisation of in-spiral signals, 
we have been able to implement a conceptually simple 
yet flexible framework for drawing inference from obser- 
vations. Although the Bayesian formalism calls for the 
evaluation and integration of high-dimensional likelihood 
functions, we have shown that the nested sampling tech- 
nique provides us with a means to both search for and es- 
timate the parameters of a signal. Further work remains 
to be done in improving the efficiency and reliability of 
detection at low masses, but the particular implementa- 
tion that we describe here provides a solid basis for these 
future improvements. 

We have used our implementation to demonstrate the 
power of Bayesian model selection in classifying putative 
gravitational wave signals, through the use of the coher- 
ence test described in Section IV CI The coherence test 
goes some way to implementing a robust and Bayesian 
defence against glitches present in gravitational wave 
data which do not resemble coherent gravitational waves 
by providing an internal consistency check within the de- 
tector network. Tests of this kind may provide a useful 
new additional discriminator when analysing candidate 
gravitational waves, and are only achievable through the 
treatment of the detector network in a coherent way. No- 
tably, this test is only possible through the use of the 
Occam factor and the comparison of probability distribu- 
tions which differ in their dimensionality, and could not 
be possible with point estimates of maximum likelihood. 
Indeed, the maximum likelihood of the coherent and in- 
coherent models can be trivially shown to be identical. In 
addition to such tests, it would also be desirable to model 
the detector data in a way which is non-stationary or 
Gaussian, but this is beyond the scope of this work. Fur- 
thermore, the use of the Bayesian parameter estimation 
framework is invaluable in inferring the signal parame- 
ters, and produces the full posterior probability distribu- 
tion, not just maximum likelihood points and estimates 
of variance. The covariance and interdependence of the 
parameters does not prevent us from calculating consis- 
tent joint PDFs on the full parameter space even when 
point estimates have little meaning, and this allows us 
to see the benefit of using a network of detectors in a 
coherent fashion. 

One of the main benefits in the use of the nested sam- 
pling and Bayesian evidence approach is the ease with 
which it can be extended or adapted to different signal 
models with minimal changes needed to the implemen- 



tation. In other work past and ongoing, we have shown 
how this method may be applied to the discrimination 
between candidate waveforms when comparing to a nu- 
merical relativity simulation; how one may place bounds 
on the Compton wavelength of the graviton given a sin- 
gle or multiple observations of an in-spiral signal, and 
testing the effects of including spins in the detection and 
estimation of in-spiral signals (27l . [76j . 

Future work on the implementation of the core algo- 
rithm will focus on achieving a better reliability and ef- 
ficiency in the sampling of the parameter space, which 
should lead to further improvements in the performance 
in a real world situation, and on testing on the full set of 
waveform approximants used in present searches for co- 
alescing binary systems. The work presented here forms 
the basis of the lalapps_inspnest program, which is 
can be found in the LALApps software distribution, re- 
leased under the terms of the GNU General Public Li- 
cence [30l | . 
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Appendix A: Definitions and conventions 

We provide the definitions and conventions that we 
have adopted to link the continuous representation of a 
time series which is used in the main body of the paper 
with the discrete representation. The latter is what is 
actually used for applications, and corresponds to the ex- 
pressions implemented in the lalapps_inspnest soft- 
ware. This appendix provides also a mapping of the no- 
tation used in our previous papers pil l2q and the one 
used here. 

Consider a time series a(t) sampled at time intervals 
At = l/(2/N y ), where /n y is the Nyquist frequency, for a 
total observation time T. The total number of data sam- 
ples is therefore N p = T/At = 2T/n y . Given a generic 
real function a(t), our conventions for the Fourier Trans- 
form are |8lj |: 

/+00 
a^e-^dt, (Al) 
-OO 
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and the inverse transform is 



O(t) 



a(f)e 



+2nift 



df. 



(A2) 



The data points at time tj and frequency f k arc therefore: 



a(tj) = a(jAt) = cij 
a(fk) = h(k/T) = Atxa k 



where we have defined the Fourier series as: 
a k = *£ aj e-™W N >, 

,+2-n-ijk/Np 



a, 



— V 



a k e 



(A3) 
(A4) 



(A5) 
(A6) 



In the following we will indicate with a(/&) the 
(dimension-full) approximation to the Fourier Transform 
of a(/) at frequency / = f k . 

We consider now the statistical properties of the noise. 
The one-sided noise spectral density S(f) is defined in 
the continuous case as: 



S(f) = 2 
which yields 



(n(t + T)n(T))e- l2 * :ft dt. 



(A7) 



If one defines 



K |2\ 



° k = (l»fc 

Ct = {\z k \ 2 ) = (\y k \ 2 ) 

4 = Kl (A13) 

the variance of the (complex) noise and the real and imag- 
inary part of the Fourier series elements, then 



S(fk) = 2 



Af 
T 



-erf. = 2- 



'if' 



>k ■ 



4 — Ct ■ 



(A14) 



Let us now considered the usual inner product (.|.) [59| 
between two (real) functions a and b and its approxima- 
tion in the finite case: 



(a\b) 



2 rmnn±pim df (A15) 

Jo 



S(f) 



2 ^ a(/ fc )b*(/ fc ) + a*(/ fc )b(/ fe ) (Alg) 



S(fk) 



E 

fc>0 



UkK + a* k b k 



(A17) 



<»(/)»•(/')> = ^S(f)5(f-n = SW(f)8(f-f) , (A8) 

where the factor 1/2 comes from the fact that S(f) is 
the one-sided noise spectral density; this is related to the 
two-sided noise spectral density S^ 2 ' (/) by 



S {2) (f) = ^S(f). 



(A9) 



If we explicitly write the real and imaginary part of 
the noise contribution in the Fourier domain, which is 
the notation adopted in [U, [25| as 



n(fk) = x(fk) + iy(fk) , 
and we substitute into Eq. (|A8|) . we obtain: 



(A10) 



The optimal signal-to-noise ratio is just the square root 
of the norm of h, and using Eq. (|A17|) it yields: 



(h\h) 



T 5-^ 



\Uf)? 
s(f) 

mm* 



df, 



\~a k \ 2 



k>0 * 



(A18) 



The likelihood function for the data set d given the model 
h, Eq. (|22p therefore becomes 



<K/ fc )i 2 ) = mfk)? 



(AH) 



where the first equality comes from the fact that x(f k ) 
and y(fk) are independent. In terms of the elements of 
the Fourier scries one has: 



At 2 (K| 2 ) = At 2 [(\x k \ 2 ) + (\y k \ 2 )] 



(A12) 



p(d\h,Hs) oc exp 



oc exp 



(d-h\d-h) 



\d(fk)-Hfk)\' 



k>0 



S(fk) 



(A19) 



(A20) 



and this is the expression used in the software implemen- 
tation through Eqs. (TA4]) . and (|A14|) . 
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